Overview

Dataset statistics

Number of variables: 12
Number of observations: 356
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): 0.3%
Total size in memory: 57.2 KiB
Average record size in memory: 164.6 B

Variable types

NUM: 10
DATE: 1
CAT: 1

Reproduction

Analysis started: 2020-04-03 11:16:48.687076
Analysis finished: 2020-04-03 11:17:05.743195
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Duplicates: Dataset has 1 (0.3%) duplicate row
High Correlation: enthalpy is highly correlated with temperature and 1 other field
High Correlation: temperature is highly correlated with enthalpy and 1 other field
High Correlation: dewpoint is highly correlated with humidity_abs
High Correlation: humidity_abs is highly correlated with dewpoint
High Correlation: density is highly correlated with temperature and 1 other field
Zeros: cloudcoverage has 146 (41.0%) zeros
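
The duplicate-row and zeros messages above are plain counts expressed as a share of the 356 observations. The following is a minimal sketch of that arithmetic, not the report's own code: the function names and sample data are hypothetical, and counting each repeated copy of a row once is an assumption about how duplicates are tallied.

```python
from collections import Counter


def duplicate_rows(rows):
    """Count rows that repeat an earlier row (each extra copy counts once)."""
    counts = Counter(tuple(row) for row in rows)
    return sum(c - 1 for c in counts.values())


def zero_share(values):
    """Return (number of zeros, percentage of zeros) for a numeric column."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, 100.0 * zeros / len(values)


# Hypothetical column with the same zero count and length as cloudcoverage.
cloudcoverage = [0] * 146 + [75] * 210
zeros, pct = zero_share(cloudcoverage)
print(zeros, f"{pct:.1f}%")  # 146 41.0%

# Hypothetical rows: the second row repeats the first.
rows = [("a", 1), ("a", 1), ("b", 2)]
print(duplicate_rows(rows))  # 1
```

With 356 observations, one duplicate row is 1 / 356, or about 0.3%, which matches the figure in the dataset statistics.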

Variables

datetime
Date

Distinct count: 355
Unique (%): 99.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
Minimum: 2020-03-19 11:45:50
Maximum: 2020-04-03 12:56:50
Histogram

temperature
Real number (ℝ)

HIGH CORRELATION
Distinct count: 316
Unique (%): 88.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.81761236
Minimum: -3.77
Maximum: 16.02
Zeros: 1
Zeros (%): 0.3%
Memory size: 2.9 KiB

Quantile statistics

Minimum: -3.77
5-th percentile: -1.97
Q1: 1.8175
Median: 4.755
Q3: 7.4875
95-th percentile: 12.7275
Maximum: 16.02
Range: 19.79
Interquartile range (IQR): 5.67

Descriptive statistics

Standard deviation: 4.307107022
Coefficient of variation (CV): 0.894033538
Kurtosis: -0.2360300687
Mean: 4.81761236
Mean Absolute Deviation (MAD): 3.415592413
Skewness: 0.2424348125
Sum: 1715.07
Variance: 18.5511709
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-3.77 2.67 7.65 10.21 16.02], "bayesian blocks" binning strategy used)
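
Several of the temperature figures above follow from the others: the range and IQR are differences of quantiles, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A small sketch that recomputes them from the reported values (the function name `derived_stats` is hypothetical):

```python
def derived_stats(minimum, maximum, q1, q3, mean, std):
    """Recompute the derived figures from the primary ones."""
    return {
        "range": maximum - minimum,   # Range = Maximum - Minimum
        "iqr": q3 - q1,               # IQR = Q3 - Q1
        "cv": std / mean,             # CV = standard deviation / mean
        "variance": std ** 2,         # Variance = standard deviation squared
    }


# Values copied from the temperature statistics in this report.
temperature = derived_stats(
    minimum=-3.77, maximum=16.02,
    q1=1.8175, q3=7.4875,
    mean=4.81761236, std=4.307107022,
)
for name, value in temperature.items():
    print(f"{name}: {value:.4f}")
```

The same CV formula explains why dewpoint reports a negative coefficient of variation: its mean is negative while its standard deviation is positive.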
Common values

Value | Count | Frequency (%)
4.08 | 3 | 0.8%
4.94 | 3 | 0.8%
4.76 | 2 | 0.6%
-1.33 | 2 | 0.6%
9.63 | 2 | 0.6%
6.26 | 2 | 0.6%
4.71 | 2 | 0.6%
-1.71 | 2 | 0.6%
9.32 | 2 | 0.6%
4.38 | 2 | 0.6%
Other values (306) | 334 | 93.8%

Minimum 5 values

Value | Count | Frequency (%)
-3.77 | 1 | 0.3%
-3.72 | 1 | 0.3%
-3.66 | 1 | 0.3%
-3.43 | 1 | 0.3%
-3.04 | 1 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
16.02 | 1 | 0.3%
15.84 | 1 | 0.3%
15.4 | 1 | 0.3%
15.19 | 1 | 0.3%
15.16 | 1 | 0.3%

humidity
Real number (ℝ≥0)

Distinct count: 71
Unique (%): 19.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 55.70786517
Minimum: 21
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 21
5-th percentile: 28
Q1: 40
Median: 55
Q3: 72
95-th percentile: 87
Maximum: 100
Range: 79
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 19.23052283
Coefficient of variation (CV): 0.3452030117
Kurtosis: -1.032915701
Mean: 55.70786517
Mean Absolute Deviation (MAD): 16.4580861
Skewness: 0.1302029247
Sum: 19832
Variance: 369.8130084
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 21. 27.5 81.5 92.5 100. ], "bayesian blocks" binning strategy used)
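
A note on the MAD figure: humidity is integer-valued with a median of 55, so a median absolute deviation about the median would have to be a multiple of 0.5. The reported 16.4580861 is not, which suggests the value is a mean absolute deviation about the mean (what the pandas `Series.mad()` method of that era computed). The sketch below contrasts the two definitions on hypothetical data; the function names are illustrative and this is not the report's own code.

```python
from statistics import mean, median


def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])


def median_absolute_deviation(values):
    """Median distance of the values from their median."""
    center = median(values)
    return median([abs(v - center) for v in values])


# Hypothetical integer sample with one large value.
sample = [1, 2, 3, 4, 10]
print(mean_absolute_deviation(sample))    # 2.4
print(median_absolute_deviation(sample))  # 1
```

The two measures can differ noticeably when the data are skewed, so it matters which one a "MAD" label refers to.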
Common values

Value | Count | Frequency (%)
47 | 11 | 3.1%
75 | 11 | 3.1%
51 | 11 | 3.1%
56 | 10 | 2.8%
80 | 10 | 2.8%
64 | 9 | 2.5%
68 | 9 | 2.5%
35 | 9 | 2.5%
81 | 9 | 2.5%
70 | 9 | 2.5%
Other values (61) | 258 | 72.5%

Minimum 5 values

Value | Count | Frequency (%)
21 | 3 | 0.8%
22 | 5 | 1.4%
23 | 2 | 0.6%
24 | 2 | 0.6%
26 | 4 | 1.1%

Maximum 5 values

Value | Count | Frequency (%)
100 | 1 | 0.3%
93 | 4 | 1.1%
92 | 3 | 0.8%
90 | 2 | 0.6%
89 | 3 | 0.8%

pressure
Real number (ℝ≥0)

Distinct count: 36
Unique (%): 10.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1026.595506
Minimum: 1005
Maximum: 1041
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 1005
5-th percentile: 1010.75
Q1: 1022
Median: 1027
Q3: 1032
95-th percentile: 1039
Maximum: 1041
Range: 36
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 8.24104342
Coefficient of variation (CV): 0.008027546755
Kurtosis: 0.03822658905
Mean: 1026.595506
Mean Absolute Deviation (MAD): 6.397803308
Skewness: -0.5099153962
Sum: 365468
Variance: 67.91479665
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1005. 1016.5 1023.5 1031.5 1033.5 1041. ], "bayesian blocks" binning strategy used)
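
For the fixed-size histogram, ten equal-width bins over the pressure range of 36 (1005 to 1041) would each be 3.6 wide. The following is a minimal sketch of equal-width binning, assuming the last bin is closed on the right so the maximum is counted. The function name is hypothetical, and the variable-width "bayesian blocks" strategy is not reproduced here.

```python
def fixed_width_histogram(values, bins=10):
    """Return (edges, counts) for equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


# Hypothetical sample, not the report's data.
edges, counts = fixed_width_histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5)
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(counts)  # [2, 2, 2, 2, 3]
```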
Common values

Value | Count | Frequency (%)
1025 | 34 | 9.6%
1026 | 27 | 7.6%
1029 | 23 | 6.5%
1031 | 18 | 5.1%
1028 | 17 | 4.8%
1038 | 17 | 4.8%
1030 | 17 | 4.8%
1035 | 16 | 4.5%
1024 | 16 | 4.5%
1027 | 14 | 3.9%
Other values (26) | 157 | 44.1%

Minimum 5 values

Value | Count | Frequency (%)
1005 | 5 | 1.4%
1006 | 4 | 1.1%
1007 | 5 | 1.4%
1008 | 2 | 0.6%
1010 | 2 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
1041 | 4 | 1.1%
1040 | 9 | 2.5%
1039 | 7 | 2.0%
1038 | 17 | 4.8%
1037 | 11 | 3.1%

windspeed
Real number (ℝ≥0)

Distinct count: 16
Unique (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.646348315
Minimum: 1
Maximum: 8.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1.5
Q1: 2.6
Median: 3.1
Q3: 4.6
95-th percentile: 7.2
Maximum: 8.7
Range: 7.7
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.620020216
Coefficient of variation (CV): 0.4442856459
Kurtosis: 0.3361131065
Mean: 3.646348315
Mean Absolute Deviation (MAD): 1.260833544
Skewness: 0.8800921533
Sum: 1298.1
Variance: 2.624465501
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1. 1.25 4.35 5.4 8.7 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
3.1 | 63 | 17.7%
2.1 | 44 | 12.4%
3.6 | 42 | 11.8%
2.6 | 41 | 11.5%
4.1 | 40 | 11.2%
1.5 | 31 | 8.7%
5.1 | 22 | 6.2%
4.6 | 19 | 5.3%
6.7 | 13 | 3.7%
5.7 | 10 | 2.8%
Other values (6) | 31 | 8.7%

Minimum 5 values

Value | Count | Frequency (%)
1 | 4 | 1.1%
1.5 | 31 | 8.7%
2.1 | 44 | 12.4%
2.6 | 41 | 11.5%
3.1 | 63 | 17.7%

Maximum 5 values

Value | Count | Frequency (%)
8.7 | 1 | 0.3%
8.2 | 4 | 1.1%
7.7 | 5 | 1.4%
7.2 | 9 | 2.5%
6.7 | 13 | 3.7%

winddir
Real number (ℝ≥0)

Distinct count: 28
Unique (%): 7.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 160.3988764
Minimum: 10
Maximum: 399
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 10
5-th percentile: 20
Q1: 50
Median: 100
Q3: 280
95-th percentile: 360
Maximum: 399
Range: 389
Interquartile range (IQR): 230

Descriptive statistics

Standard deviation: 122.1983075
Coefficient of variation (CV): 0.7618401717
Kurtosis: -1.417632276
Mean: 160.3988764
Mean Absolute Deviation (MAD): 113.9559715
Skewness: 0.4307059821
Sum: 57102
Variance: 14932.42636
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 10. 95. 135. 245. 265. 399.], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
50 | 27 | 7.6%
80 | 26 | 7.3%
60 | 22 | 6.2%
260 | 21 | 5.9%
40 | 21 | 5.9%
90 | 18 | 5.1%
20 | 17 | 4.8%
250 | 17 | 4.8%
30 | 17 | 4.8%
280 | 16 | 4.5%
Other values (18) | 154 | 43.3%

Minimum 5 values

Value | Count | Frequency (%)
10 | 10 | 2.8%
20 | 17 | 4.8%
30 | 17 | 4.8%
40 | 21 | 5.9%
50 | 27 | 7.6%

Maximum 5 values

Value | Count | Frequency (%)
399 | 8 | 2.2%
360 | 15 | 4.2%
350 | 6 | 1.7%
340 | 12 | 3.4%
330 | 7 | 2.0%

cloudcoverage
Real number (ℝ≥0)

ZEROS
Distinct count: 40
Unique (%): 11.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.15449438
Minimum: 0
Maximum: 100
Zeros: 146
Zeros (%): 41.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 14
Q3: 40
95-th percentile: 75.25
Maximum: 100
Range: 100
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 30.24913002
Coefficient of variation (CV): 1.202533812
Kurtosis: -0.6819635569
Mean: 25.15449438
Mean Absolute Deviation (MAD): 25.80120881
Skewness: 0.8997390538
Sum: 8955
Variance: 915.0098671
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 19.5 20.5 21.5 38.5 40.5 71.5 75.5 100. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 146 | 41.0%
75 | 50 | 14.0%
20 | 44 | 12.4%
40 | 21 | 5.9%
14 | 6 | 1.7%
1 | 5 | 1.4%
21 | 5 | 1.4%
2 | 4 | 1.1%
30 | 4 | 1.1%
4 | 3 | 0.8%
Other values (30) | 68 | 19.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 146 | 41.0%
1 | 5 | 1.4%
2 | 4 | 1.1%
3 | 2 | 0.6%
4 | 3 | 0.8%

Maximum 5 values

Value | Count | Frequency (%)
100 | 3 | 0.8%
99 | 2 | 0.6%
88 | 2 | 0.6%
85 | 2 | 0.6%
84 | 1 | 0.3%

weather_description
Categorical

Distinct count: 9
Unique (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Common values

Value | Count | Frequency (%)
Clear : clear sky | 168 | 47.2%
Clouds : few clouds | 67 | 18.8%
Clouds : broken clouds | 61 | 17.1%
Clouds : scattered clouds | 34 | 9.6%
Rain : light rain | 12 | 3.4%
Clouds : overcast clouds | 8 | 2.2%
Rain : light intensity shower rain | 2 | 0.6%
Mist : mist | 2 | 0.6%
Snow : light snow | 2 | 0.6%

Length

Max length: 34
Mean length: 19.21629213
Min length: 11

Unicode categories

Value | Count | Frequency (%)
Lowercase_Letter | 21 | 77.8%
Uppercase_Letter | 4 | 14.8%
Space_Separator | 1 | 3.7%
Other_Punctuation | 1 | 3.7%

Scripts

Value | Count | Frequency (%)
Latin | 25 | 92.6%
Common | 2 | 7.4%

Blocks

Value | Count | Frequency (%)
ASCII | 27 | 100.0%

humidity_abs
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 48
Unique (%): 13.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.002970786517
Minimum: 0.0014
Maximum: 0.007
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 0.0014
5-th percentile: 0.0015
Q1: 0.0019
Median: 0.0028
Q3: 0.0037
95-th percentile: 0.0052
Maximum: 0.007
Range: 0.0056
Interquartile range (IQR): 0.0018

Descriptive statistics

Standard deviation: 0.001186231077
Coefficient of variation (CV): 0.3992986606
Kurtosis: 0.5310711166
Mean: 0.002970786517
Mean Absolute Deviation (MAD): 0.0009640891302
Skewness: 0.8511770785
Sum: 1.0576
Variance: 1.407144168e-06
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.0014 0.00185 0.00445 0.00545 0.007 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0.0016 | 21 | 5.9%
0.0015 | 20 | 5.6%
0.0022 | 17 | 4.8%
0.0017 | 17 | 4.8%
0.0018 | 17 | 4.8%
0.0034 | 15 | 4.2%
0.0033 | 15 | 4.2%
0.0032 | 15 | 4.2%
0.0024 | 14 | 3.9%
0.0025 | 13 | 3.7%
Other values (38) | 192 | 53.9%

Minimum 5 values

Value | Count | Frequency (%)
0.0014 | 7 | 2.0%
0.0015 | 20 | 5.6%
0.0016 | 21 | 5.9%
0.0017 | 17 | 4.8%
0.0018 | 17 | 4.8%

Maximum 5 values

Value | Count | Frequency (%)
0.007 | 1 | 0.3%
0.0068 | 1 | 0.3%
0.0067 | 1 | 0.3%
0.0066 | 1 | 0.3%
0.0065 | 2 | 0.6%

enthalpy
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 32
Unique (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.35393258
Minimum: 0
Maximum: 32
Zeros: 2
Zeros (%): 0.6%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.75
Q1: 8
Median: 12
Q3: 16
95-th percentile: 24
Maximum: 32
Range: 32
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.408050503
Coefficient of variation (CV): 0.5187053159
Kurtosis: 0.1770665502
Mean: 12.35393258
Mean Absolute Deviation (MAD): 5.027490216
Skewness: 0.489148045
Sum: 4398
Variance: 41.06311125
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 7.5 14.5 21.5 32. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
14 | 28 | 7.9%
11 | 27 | 7.6%
13 | 26 | 7.3%
12 | 23 | 6.5%
10 | 21 | 5.9%
8 | 19 | 5.3%
9 | 18 | 5.1%
5 | 17 | 4.8%
17 | 17 | 4.8%
21 | 16 | 4.5%
Other values (22) | 144 | 40.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2 | 0.6%
1 | 6 | 1.7%
2 | 10 | 2.8%
3 | 11 | 3.1%
4 | 11 | 3.1%

Maximum 5 values

Value | Count | Frequency (%)
32 | 1 | 0.3%
31 | 3 | 0.8%
30 | 2 | 0.6%
29 | 2 | 0.6%
28 | 1 | 0.3%

dewpoint
Real number (ℝ)

HIGH CORRELATION
Distinct count: 314
Unique (%): 88.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -3.994129213
Minimum: -12.96
Maximum: 8.79
Zeros: 1
Zeros (%): 0.3%
Memory size: 2.9 KiB

Quantile statistics

Minimum: -12.96
5-th percentile: -11.6575
Q1: -8.585
Median: -3.95
Q3: 0.0025
95-th percentile: 4.615
Maximum: 8.79
Range: 21.75
Interquartile range (IQR): 8.5875

Descriptive statistics

Standard deviation: 5.170022647
Coefficient of variation (CV): -1.294405456
Kurtosis: -0.7990097864
Mean: -3.994129213
Mean Absolute Deviation (MAD): 4.379770389
Skewness: 0.148028125
Sum: -1421.91
Variance: 26.72913417
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-12.96 2.335 8.79 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
-10.73 | 3 | 0.8%
-1.1 | 3 | 0.8%
-4.44 | 3 | 0.8%
-6.81 | 2 | 0.6%
-2.2 | 2 | 0.6%
-2.14 | 2 | 0.6%
-5.87 | 2 | 0.6%
1.69 | 2 | 0.6%
-4.57 | 2 | 0.6%
-9.64 | 2 | 0.6%
Other values (304) | 333 | 93.5%

Minimum 5 values

Value | Count | Frequency (%)
-12.96 | 1 | 0.3%
-12.9 | 1 | 0.3%
-12.81 | 1 | 0.3%
-12.56 | 1 | 0.3%
-12.55 | 1 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
8.79 | 1 | 0.3%
8.38 | 1 | 0.3%
8.17 | 1 | 0.3%
7.9 | 1 | 0.3%
7.85 | 1 | 0.3%

density
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 83
Unique (%): 23.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.267502809
Minimum: 1.218
Maximum: 1.309
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 1.218
5-th percentile: 1.23
Q1: 1.254
Median: 1.268
Q3: 1.281
95-th percentile: 1.3
Maximum: 1.309
Range: 0.091
Interquartile range (IQR): 0.027

Descriptive statistics

Standard deviation: 0.02003525043
Coefficient of variation (CV): 0.01580686866
Kurtosis: -0.2999573909
Mean: 1.267502809
Mean Absolute Deviation (MAD): 0.01589041788
Skewness: -0.163961258
Sum: 451.231
Variance: 0.0004014112597
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.218 1.2425 1.2605 1.2765 1.309 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
1.266 | 12 | 3.4%
1.27 | 12 | 3.4%
1.268 | 11 | 3.1%
1.262 | 11 | 3.1%
1.276 | 9 | 2.5%
1.256 | 9 | 2.5%
1.271 | 9 | 2.5%
1.258 | 8 | 2.2%
1.273 | 8 | 2.2%
1.263 | 7 | 2.0%
Other values (73) | 260 | 73.0%

Minimum 5 values

Value | Count | Frequency (%)
1.218 | 1 | 0.3%
1.219 | 2 | 0.6%
1.221 | 1 | 0.3%
1.222 | 4 | 1.1%
1.223 | 1 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
1.309 | 1 | 0.3%
1.308 | 2 | 0.6%
1.307 | 1 | 0.3%
1.305 | 3 | 0.8%
1.304 | 5 | 1.4%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
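
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report uses; the function name `pearson_r` is hypothetical. The 1/n factors of the covariance and the standard deviations cancel, so they are omitted.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Temperature and density from the first five sample rows of this report.
temperature = [13.91, 13.81, 13.48, 13.65, 13.26]
density = [1.224, 1.225, 1.226, 1.226, 1.227]
print(pearson_r(temperature, density))

# Shifting and rescaling one variable by a positive factor leaves r unchanged.
fahrenheit = [t * 9 / 5 + 32 for t in temperature]
print(pearson_r(fahrenheit, density))
```

Both calls print the same value up to floating-point rounding, which illustrates the invariance under changes of location and scale.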

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
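
In other words, ρ is Pearson's r applied to ranks. A minimal sketch, assuming tied values share the average of their ranks (a common convention, stated here as an assumption); all function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (hypothetical data): y = x cubed.
x = [1, 2, 3, 4]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))  # 1.0 up to floating-point rounding
print(pearson_r(x, y))     # below 1, because the relationship is not linear
```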

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
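
The formula as stated corresponds to the variant usually called tau-a, in which tied pairs count as neither concordant nor discordant. Statistical libraries often default to tau-b, which adjusts the denominator for ties, so values from a library may differ slightly when ties are present. A minimal sketch of tau-a, with a hypothetical function name:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 is a tie and contributes to neither count
    return (concordant - discordant) / len(pairs)


# Hypothetical data: one of the three pairs is discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # (2 - 1) / 3
```

This pairwise version runs in quadratic time, which is fine for a dataset of 356 observations.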

Missing values

Sample

First rows

index | datetime | temperature | humidity | pressure | windspeed | winddir | cloudcoverage | weather_description | humidity_abs | enthalpy | dewpoint | density
0 | 2020-03-19 11:45:50 | 13.91 | 71 | 1025 | 4.1 | 290.0 | 75 | Clouds : broken clouds | 0.0070 | 32.0 | 8.79 | 1.224
1 | 2020-03-19 12:50:48 | 13.81 | 67 | 1025 | 3.6 | 330.0 | 75 | Clouds : broken clouds | 0.0065 | 31.0 | 7.85 | 1.225
2 | 2020-03-19 13:46:46 | 13.48 | 71 | 1025 | 3.6 | 350.0 | 75 | Clouds : broken clouds | 0.0068 | 31.0 | 8.38 | 1.226
3 | 2020-03-19 14:52:21 | 13.65 | 67 | 1025 | 3.1 | 360.0 | 40 | Clouds : scattered clouds | 0.0065 | 30.0 | 7.70 | 1.226
4 | 2020-03-19 15:52:40 | 13.26 | 71 | 1025 | 3.6 | 340.0 | 40 | Clouds : scattered clouds | 0.0067 | 30.0 | 8.17 | 1.227
5 | 2020-03-19 16:53:38 | 12.55 | 71 | 1025 | 3.1 | 350.0 | 75 | Clouds : broken clouds | 0.0064 | 29.0 | 7.49 | 1.230
6 | 2020-03-19 17:52:00 | 11.45 | 76 | 1025 | 3.1 | 30.0 | 75 | Clouds : broken clouds | 0.0064 | 28.0 | 7.43 | 1.235
7 | 2020-03-19 18:47:19 | 10.15 | 76 | 1025 | 3.1 | 10.0 | 75 | Clouds : broken clouds | 0.0058 | 25.0 | 6.18 | 1.241
8 | 2020-03-19 19:52:54 | 9.50 | 81 | 1026 | 3.6 | 40.0 | 75 | Clouds : broken clouds | 0.0059 | 25.0 | 6.47 | 1.244
9 | 2020-03-19 20:48:20 | 8.92 | 76 | 1026 | 2.1 | 80.0 | 75 | Clouds : broken clouds | 0.0054 | 23.0 | 4.99 | 1.247

Last rows

index | datetime | temperature | humidity | pressure | windspeed | winddir | cloudcoverage | weather_description | humidity_abs | enthalpy | dewpoint | density
346 | 2020-04-03 03:52:48 | 6.68 | 70 | 1005 | 7.2 | 280.0 | 75 | Rain : light rain | 0.0042 | 17.0 | 1.68 | 1.258
347 | 2020-04-03 04:54:44 | 6.28 | 75 | 1006 | 5.1 | 260.0 | 77 | Rain : light rain | 0.0044 | 17.0 | 2.26 | 1.259
348 | 2020-04-03 05:50:45 | 6.16 | 70 | 1006 | 4.1 | 280.0 | 75 | Clouds : broken clouds | 0.0041 | 17.0 | 1.19 | 1.260
349 | 2020-04-03 06:47:05 | 5.74 | 72 | 1007 | 7.2 | 290.0 | 20 | Clouds : few clouds | 0.0041 | 16.0 | 1.18 | 1.262
350 | 2020-04-03 07:49:06 | 5.33 | 60 | 1008 | 6.7 | 300.0 | 20 | Clouds : few clouds | 0.0033 | 14.0 | -1.70 | 1.265
351 | 2020-04-03 08:53:05 | 5.80 | 60 | 1010 | 6.7 | 280.0 | 20 | Clouds : few clouds | 0.0034 | 14.0 | -1.26 | 1.262
352 | 2020-04-03 09:50:01 | 7.17 | 56 | 1010 | 7.7 | 280.0 | 20 | Clouds : few clouds | 0.0035 | 16.0 | -0.92 | 1.256
353 | 2020-04-03 11:00:25 | 8.06 | 52 | 1011 | 8.2 | 280.0 | 40 | Clouds : scattered clouds | 0.0035 | 17.0 | -1.10 | 1.252
354 | 2020-04-03 11:53:53 | 8.54 | 52 | 1012 | 6.2 | 290.0 | 75 | Clouds : broken clouds | 0.0036 | 18.0 | -0.65 | 1.250
355 | 2020-04-03 12:56:50 | 9.47 | 49 | 1012 | 6.7 | 290.0 | 75 | Clouds : broken clouds | 0.0036 | 19.0 | -0.61 | 1.246